Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve fsspec DataPipe to accept extra keyword arguments #495

Closed
wants to merge 1 commit into from

Conversation

ejguan
Copy link
Contributor

@ejguan ejguan commented Jun 3, 2022

Fixes #494

Changes

  • Add kwargs to all DataPipes using fsspec
  • Fix a bug in FSSpecFileLister to prevent joining paths from the root path if the root path has already presented in the paths.
  • Add/Fix more tests

I need this PR to test the performance between fsspec and native s3

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 3, 2022
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan requested review from NivekT, VitalyFedyunin and d4l3k June 3, 2022 20:56
Comment on lines +96 to +98
else:
abs_path = file_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm - this never happens if is_local == True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I can't guarantee it since I am not aware of all use cases.Based on my experience on fsspec, I only encounter the problem when the input is s3 url.
For local files, I would trust the test here https://github.com/pytorch/data/blob/main/test/test_fsspec.py#L56-L65. As long as it doesn't break, I think

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan mentioned this pull request Jun 7, 2022
ejguan added a commit to ejguan/data that referenced this pull request Jun 7, 2022
Summary:
Fixes pytorch#494

### Changes

- Add `kwargs` to all `DataPipe`s using `fsspec`
- Fix a bug in `FSSpecFileLister` to prevent joining paths from the root path if the root path has already presented in the paths.
- Add/Fix more tests

I need this PR to test the performance between `fsspec` and native `s3`

Pull Request resolved: pytorch#495

Reviewed By: NivekT

Differential Revision: D36908570

Pulled By: ejguan

fbshipit-source-id: 93da4c1c7e18012fb053799265375124d28a856c
ejguan added a commit that referenced this pull request Jun 7, 2022
Summary:
Fixes #494

### Changes

- Add `kwargs` to all `DataPipe`s using `fsspec`
- Fix a bug in `FSSpecFileLister` to prevent joining paths from the root path if the root path has already presented in the paths.
- Add/Fix more tests

I need this PR to test the performance between `fsspec` and native `s3`

Pull Request resolved: #495

Reviewed By: NivekT

Differential Revision: D36908570

Pulled By: ejguan

fbshipit-source-id: 93da4c1c7e18012fb053799265375124d28a856c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fsspec requires to delegate kwargs to FileSystem to list/load file correctly
3 participants